(54) RRGS-round-robin greedy scheduling for input/output buffered terabit switches



(57) A novel protocol for scheduling packets in high-speed cell-based
switches is provided. The switch is assumed to use a logical crossbar
fabric with input buffers. The scheduler may be used in optical as
well as electronic switches with terabit capacity. The proposed
round-robin greedy scheduling (RRGS) achieves optimal scheduling at
terabit throughput, using a pipeline technique. The pipeline approach
avoids the need for internal speedup of the switching fabric to
achieve high utilization. A method for determining a time slot in an
NxN crossbar switch for a round-robin greedy scheduling protocol,
comprising N logical queues corresponding to N output ports, the
input for the protocol being a state of all the input-output queues,
the output of the protocol being a schedule, the method comprising:
choosing an input corresponding to i = (constant - k - 1) mod N;
stopping if there are no more inputs, otherwise choosing the next
input in a round-robin fashion determined by i = (i + 1) mod N;
choosing an output j such that a pair (i,j) belongs to a set
C = {(i,j) | there is at least one packet from i to j}, if the pair
(i,j) exists; removing i from a set of inputs and repeating the steps
if the pair (i,j) does not exist; removing i from the set of inputs
and j from a set of outputs; and adding the pair (i,j) to the schedule
and repeating the steps.

Description

[0001] This invention relates to terabit switches for use in
applications like electronic and optical media. Specifically, this
invention relates to a round-robin greedy scheduling algorithm. This
invention is embodied in methods for scheduling and in a terabit
switching system that implements the round-robin greedy scheduling
algorithm.

[0002] With the growing demand for bandwidth, there is an increasing
need for terabit switching; see M. Beshai and E. Münter,
"Multi-tera-bit/s switch based on burst transfer and independent
shared buffers," ICC'95, pp. 1724-1730; N. McKeown et al., "The tiny
tera: a packet switch core," IEEE Micro, vol. 17, no. 1, Jan.-Feb.
1997, pp. 26-33; and W. D. Zhong, Y. Shimazu, M. Tsukuda, and K.
Yukimatsu, "A modular Tbit/s TDM-WDM photonic ATM switch using
optical buffers," IEICE Transactions on Communications, vol. E77-B,
no. 2, February 1994, pp. 190-196. An optical switching core (which
can be viewed as a logical crossbar switch) with electronic control
is an attractive candidate for high-capacity switches. At a line rate
of 10 Gb/s, a 64-byte cell/packet has to be processed within 40 ns.
[0003] An important issue faced by practitioners in the field is how
to make fast scheduling decisions that will use the optical core
efficiently. Switch design in such a case may involve input
buffering, output buffering, or both. In a switch with output
buffering, the output buffers require an access speed greater than
the total switch throughput. Alternatively, a knockout architecture
is employed in order to decrease the required output buffer speed,
where a limited number of cells are accepted by the output buffer and
the rest are dropped. An optical knockout switch has been proposed in
W. D. Zhong, Y. Shimazu, M. Tsukuda, and K. Yukimatsu, "A modular
Tbit/s TDM-WDM photonic ATM switch using optical buffers," IEICE
Transactions on Communications, vol. E77-B, no. 2, February 1994,
pp. 190-196. The complexity of optical knockout switching is high,
since each output requires several optical reverse Banyan networks
and optical buffers.
[0004] Switches with input buffering use buffers more efficiently,
and need memory bandwidth of only twice the line rate. In a simple
scheme with input buffering, all inputs make requests to transmit the
packets that are at the head of their queues. If two or more inputs
make requests for the same output, one of them is chosen randomly. It
was shown in M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input vs.
output queuing on a space-division packet switch," IEEE Transactions
on Communications, vol. COM-35, no. 12, December 1987, pp. 1347-1356,
that this input buffering algorithm leads to a throughput of 0.587
under uniform traffic conditions. The efficiency decreases further in
the case of non-uniform traffic. In several other scheduling schemes,
packets other than the HOL (head of line) packets contend for output
ports. See R. Fan, M. Akiyama, and Y. Tanaka, "An input buffer-type
ATM switching using schedule comparison," Electronics and
Communications in Japan: Part I, vol. 74, no. 11, 1991, pp. 17-25; S.
Motoyama, D. W. Petr, and V. S. Frost, "Input-queued switch based on
a scheduling algorithm," Electronics Letters, vol. 31, no. 14, July
1995, pp. 1127-1128; and H. Obara, "Optimum architecture for input
queuing ATM switches," Electronics Letters, vol. 27, no. 7, March
1991, pp. 555-557. During each time slot, an input issues requests to
several outputs. With just 4 requests per time slot, an efficiency
approaching 1 is achieved. However, in such a scheme at high speeds,
multiple requests/acknowledgements for scheduling cannot be processed
within one time slot (where a slot represents a packet transmission
time). Also, for non-uniform traffic with hot-spots, the performance
may degrade since inputs independently decide which outputs they are
going to request.

[0005] The switch performance can be improved if the switch
controller knows the states of all input-output queues. Such
information enables the switch controller to increase the number of
simultaneous transmissions in each time slot. In the SLIP protocol,
the outputs independently issue grants to the inputs, which leads to
some inefficiency; see N. McKeown, P. Varaiya, J. Walrand,
"Scheduling cells in an input-queued switch," Electronics Letters,
vol. 29, no. 25, December 1993, pp. 2174-2175. Better coordination
between inputs is achieved by the algorithms discussed in D. Guo, Y.
Yemini, Z. Zhang, "Scalable high-speed protocols for WDM optical star
networks," IEEE INFOCOM'94. However, these algorithms have the
disadvantage that they require many time slots to make scheduling
decisions.
[0006] It is an objective of the present invention to solve the
above-mentioned problems in the conventional technologies.
Specifically, it is an objective of the present invention to provide
a method for making scheduling decisions in a terabit switch that
will efficiently use the optical core. It is a further objective of
the present invention to provide a pipeline architecture that
performs round-robin greedy scheduling while providing good
performance and fulfilling stringent timing requirements without
internal speedups.
[0007] In order to meet the above objectives, there is provided a
method for determining a time slot in an NxN crossbar switch for a
round-robin greedy scheduling protocol, comprising N logical queues
corresponding to N output ports, the input for the protocol being a
state of all the input-output queues, the output of the protocol
being a schedule, the method comprising: choosing an input
corresponding to i = (constant - k - 1) mod N; stopping if there are
no more inputs, otherwise choosing the next input in a round-robin
fashion determined by i = (i + 1) mod N; choosing an output j such
that a pair (i,j) belongs to a set C = {(i,j) | there is at least one
packet from i to j}; if the pair (i,j) exists, removing i from the
set of inputs and j from a set of outputs, and adding the pair (i,j)
to the schedule and repeating the steps; and removing i from the set
of inputs and repeating the steps if the pair (i,j) does not exist.
Another aspect of this invention is a method of scheduling wherein in
each time slot N distinct schedules are in progress simultaneously
for N distinct future time slots, the method comprising: making a
specific future time slot available to the inputs for scheduling in a
round-robin fashion; selecting an output for a kth time slot in the
future, by an input i; starting a schedule for the kth time slot in
the future; and determining a next input (i+1) mod N and sending to
the next input the remaining outputs that are free to receive packets
during the kth time slot.
[0008] Preferably, if an input i is not the last input needed to
complete a schedule for the kth time slot, it selects an output (if
feasible) and sends a modified output set to a next input; and if the
input i completes a schedule for the kth time slot, it does not send
the output set to the next input.

[0009] Still preferably an input that does not receive 
a modified set of outputs from a previous input starts a 
new schedule. 

[0010] Another aspect of the present invention is a method of
pipelined round-robin greedy scheduling for an odd number of inputs
wherein an l-th time slot is completed using a process comprising:
initializing k(0,1) = k(1,1) = ... = k(N-1,1) = 0, const = N+1,
wherein k(i,l) > 0 is the time slot for which input i reserves an
output in the l-th time slot, i_l = (const - N - l) mod N denotes an
input that starts a new schedule in the l-th time slot, and
k(i,l) = 0 implies that the action of input i in time slot l is
suppressed; setting O_{k(i_l,l)} = {0, 1, ..., N-1},
k(i_l,l) = l + N, and k(i,l) = k((i-1) mod N, l-1) for
0 ≤ i ≤ N-1 and i ≠ i_l; choosing one output j at input i,
0 ≤ i ≤ N-1, in a round-robin fashion from the set O_{k(i,l)} for
which it has a packet to send, provided that k(i,l) ≠ 0, and
excluding j from O_{k(i,l)}; storing the address, at input i,
0 ≤ i ≤ N-1, of the chosen output in its connection memory at
location k(i,l) mod N and moving a head of line (HOL) packet from a
corresponding receive input-output queue to a separate transmit
input-output queue; forwarding the set O_{k(i,l)}, at input i,
0 ≤ i ≤ N-1 and i ≠ (i_l - 2) mod N, to the next input (i+1) mod N;
establishing a crossbar connection between the input i,
0 ≤ i ≤ N-1, and the output whose address is read from location
(l mod (N+1)) of the input i's connection memory; and transmitting
the reserved packet at the head of the scheduled transmit
input-output queue through the switch core for each input i,
0 ≤ i ≤ N-1.

[0011] Preferably, multicast scheduling is incorporated in the
round-robin greedy scheduling algorithm, wherein multicast packets
are stored in a first come first served fashion and have priority
over unicast queues, and the steps in the l-th time slot further
comprise: choosing all outputs j such that j ∈ O_{k(i,l)} ∩ BM_i at
input i, 0 ≤ i ≤ N-1, and transmitting HOL multicast packets to the
chosen outputs in the kth time slot; serving unicast queues if
O_{k(i,l)} ∩ BM_i is an empty set, otherwise excluding the chosen
outputs from O_{k(i,l)} and BM_i; and deleting HOL multicast packets
from the multicast queue if BM_i is empty.

[0012] Another aspect of the present invention is a method of
pipelined round-robin greedy scheduling for an even number of inputs
wherein an l-th time slot is completed using a process comprising:
initializing k(0,1) = k(1,1) = ... = k(N-1,1) = 0, const = N+1,
wherein k(i,l) > 0 is the time slot for which input i reserves an
output in the l-th time slot, i_l = (const - N - l) mod N denotes an
input that starts a new schedule in the l-th time slot, and
k(i,l) = 0 implies that the action of input i in time slot l is
suppressed; setting O_{k(i_l,l)} = {0, 1, ..., N-1},
k(i_l,l) = l + N + 1, k((i_l + 1) mod N, l) = k(i_l, l-2), and
k(i,l) = k((i-1) mod N, l-1) for 0 ≤ i ≤ N-1 and
i ∉ {i_l, (i_l + 1) mod N}; choosing one output j at input i,
0 ≤ i ≤ N-1, in a round-robin fashion from the set O_{k(i,l)} for
which it has a packet to send, provided that k(i,l) ≠ 0, and
excluding j from O_{k(i,l)}; storing the address, at input i,
0 ≤ i ≤ N-1, of the chosen output in its connection memory at
location k(i,l) mod (N+1) and moving an HOL packet from a
corresponding receive input-output queue to a separate transmit
input-output queue; forwarding the set O_{k(i,l)}, at input i,
0 ≤ i ≤ N-1 and i ≠ (i_l - 2) mod N, to the next input (i+1) mod N,
wherein input (i_l - 2) mod N delays the set
O_{k((i_l - 2) mod N, l)} for one time slot before forwarding it;
establishing a crossbar connection between the input i,
0 ≤ i ≤ N-1, and the output whose address is read from location
(l mod (N+1)) of the input i's connection memory; and transmitting
the reserved packet at the head of the scheduled transmit
input-output queue through the switch core for each input i,
0 ≤ i ≤ N-1.
[0013] Preferably, multicast scheduling is incorporated in the
round-robin greedy scheduling algorithm, wherein multicast packets
are stored in a first come first served fashion and have priority
over unicast queues, and the steps in the l-th time slot further
comprise: choosing all outputs j such that j ∈ O_{k(i,l)} ∩ BM_i at
input i, 0 ≤ i ≤ N-1, and transmitting HOL multicast packets to the
chosen outputs in the kth time slot; serving unicast queues if
O_{k(i,l)} ∩ BM_i is an empty set, otherwise excluding the chosen
outputs from O_{k(i,l)} and BM_i; and deleting HOL multicast packets
from the multicast queue if BM_i is empty.

[0014] Another aspect of the present invention is an N-stage pipeline
system for scheduling an NxN switch, where a stage i is associated
with an input i, said stage i schedules transmission to an output in
a future time slot, said future time slot rippling through all
stages, wherein all pipeline stages corresponding to inputs perform
scheduling concurrently such that no two inputs choose a same future
time slot at a same time, output slots being selected in a
round-robin fashion, wherein when an output is selected by a stage,
the output is removed from a free pool of outputs such that a
pipeline stage does not pick an output that has already been selected
for that time slot.
EP 1 009 189 A2

[0015] The above objectives and advantages of the present invention
will become more apparent by describing in detail preferred
embodiments thereof with reference to the attached drawings in which:

FIG. 1 shows a timing diagram for a preferred
embodiment of a switch controller with N = 5;
FIG. 2 shows a timing diagram for a preferred
embodiment of a switch controller with N = 4;
FIG. 3 shows an implementation of a control for an
NxN switch;
FIG. 4 shows a comparison of average packet
delays for an implementation of RRGS, RGS, HOL,
SLIP and I-TDMA, based on both analytic and
simulation results;
FIG. 5 shows a complementary distribution function
of packet delay in I-TDMA, SLIP, RGS and RRGS
for fixed offered traffic load of (a) 0.8 and (b) 0.9
based on simulation results; and
FIG. 6 shows average packet delay under non-uniform
traffic load for RRGS, RGS, SLIP and I-TDMA
for four groups of queues (a) G1, (b) G2, (c) G3 and
(d) overall.

[0016] The pipeline architecture according to the present invention
performs "round-robin greedy scheduling" (RRGS). This is a
modification of, and an improvement over, random greedy scheduling
(RGS); see R. Chipalkatti, Z. Zhang, and A. S. Acampora, "Protocols
for optical star-coupler network using WDM: performance and
complexity study," IEEE Journal on Selected Areas in Communications,
vol. 11, no. 4, May 1993, pp. 579-589; and D. Guo, Y. Yemini, Z.
Zhang, "Scalable high-speed protocols for WDM optical star networks,"
IEEE INFOCOM'94. The protocol of the present invention fulfills
stringent timing requirements without any internal speedup, and at
the same time preserves the good performance of RGS; see D. Guo, Y.
Yemini, Z. Zhang, "Scalable high-speed protocols for WDM optical star
networks," IEEE INFOCOM'94. It achieves close to 100% utilization,
and handles non-uniform traffic equally well.

1. Round-Robin Greedy Scheduling (RRGS) 

[0017] A preferred embodiment is now described in detail. The
protocol used in this invention is called the RRGS protocol. Consider
an NxN crossbar switch, where each input port i, i ∈ {0, 1, ..., N-1},
has N logical queues, one for each of the N outputs. All packets that
are received by the switch are fixed-size cells. The input to the
RRGS protocol is the state of all input-output queues. Such an input
can be described by a set C as follows:

C = {(i,j) | there is at least one packet at input i for output j}.

[0018] The output of the protocol is a schedule associating the
inputs with the outputs. Such a set S can be described as follows:

S = {(i,j) | a packet will be sent from input i to output j}.

[0019] It will be clear to a skilled artisan that in each time slot,
an input can transmit only one packet, and an output can receive only
one packet. Under this condition, a schedule for an arbitrary kth
time slot is determined as follows:

Step 1) I_k = {0, 1, ..., N-1} is the set of all inputs.
        O_k = {0, 1, ..., N-1} is the set of all outputs.
        Select i = (const - k - 1) mod N. Such a choice of the
        input that starts a schedule enables a simple
        implementation.

Step 2) If I_k is empty, stop. Otherwise choose the next input i
        in a round-robin fashion according to i = (i + 1) mod N.

Step 3) Choose in a round-robin fashion the output j from O_k
        such that (i,j) ∈ C_k. If such an output does not exist,
        then remove i from I_k and go to Step 2.

Step 4) Remove input i from I_k, and output j from O_k. Add
        (i,j) to S_k. Go to Step 2.
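
Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name rrgs_schedule, the per-input round-robin pointer rr_ptr, and the representation of C_k as a Python set of (i, j) pairs are hypothetical choices made for the example. The text above does not specify how the round-robin pointer over outputs is maintained; here it is assumed to advance just past the last chosen output.

```python
def rrgs_schedule(C, N, k, const, rr_ptr=None):
    """Compute a schedule S_k for time slot k (non-pipelined RRGS sketch).

    C      -- set of (i, j) pairs: input i holds a packet for output j
    rr_ptr -- assumed per-input round-robin pointer over outputs
    """
    if rr_ptr is None:
        rr_ptr = [0] * N
    inputs = set(range(N))      # I_k
    outputs = set(range(N))     # O_k
    schedule = []               # S_k
    i = (const - k - 1) % N     # Step 1
    while inputs:               # Step 2: stop when I_k is empty
        i = (i + 1) % N
        chosen = None
        for offset in range(N):  # Step 3: round-robin scan of outputs
            j = (rr_ptr[i] + offset) % N
            if j in outputs and (i, j) in C:
                chosen = j
                break
        inputs.discard(i)        # input i is removed in Step 3 or Step 4
        if chosen is not None:   # Step 4
            outputs.discard(chosen)
            schedule.append((i, chosen))
            rr_ptr[i] = (chosen + 1) % N
    return schedule


if __name__ == "__main__":
    demo = {(0, 0), (1, 0), (1, 2), (2, 0)}
    print(rrgs_schedule(demo, N=3, k=1, const=4))
```

In the demo, input 0 is served first and takes output 0, input 1 falls through to output 2, and input 2 finds no free output for which it has a packet, so it is removed without being scheduled.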

[0020] The above protocol is clearly an improvement over the
conventional RGS. See D. Guo, Y. Yemini, Z. Zhang, "Scalable
high-speed protocols for WDM optical star networks," IEEE INFOCOM'94.
Also see the DAS algorithm described in R. Chipalkatti, Z. Zhang, and
A. S. Acampora, "Protocols for optical star-coupler network using
WDM: performance and complexity study," IEEE Journal on Selected
Areas in Communications, vol. 11, no. 4, May 1993, pp. 579-589. In
the conventional RGS, both the input and the corresponding matching
output are chosen randomly. However, implementation of such a random
selection scheme is difficult in practice. Note that in each time
slot, N packets can be transferred from the N inputs to the N
outputs.

[0021] The process of scheduling a given time slot in RRGS thus
consists of N phases. In each phase of a given time slot (in the
future), one input chooses one of the remaining outputs for
transmission during that time slot. A phase consists of a request
from the input module (IM) to the round-robin (RR) arbiter, RR
selection, and acknowledgement from the RR arbiter to the IM. The
round-robin order in which inputs choose outputs shifts cyclically at
each time slot, so that it ensures equal access for all inputs.
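
The cyclic shift of the input order can be illustrated with a short Python sketch. The helper phase_order is a hypothetical name, and the value const = N + 1 used in the example is an assumption borrowed from the initialization of the pipelined scheme; the non-pipelined description leaves const unspecified.

```python
def phase_order(N, k, const):
    """Order in which the N inputs take their phases for time slot k."""
    start = (const - k - 1) % N          # Step 1 of RRGS
    return [(start + 1 + phase) % N for phase in range(N)]


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, phase_order(5, k, const=6))
```

Over any N consecutive time slots, each input is the first to choose exactly once, which is the sense in which the rotation gives equal access.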

2. Pipelined RRGS for Odd Number of Inputs 

[0022] At high link speeds (e.g. 10 Gb/s), the N phases cannot be
completed within one time slot (40 ns assuming a packet size of 64
bytes). With increasing link speeds, using conventional technologies,
no more than one phase can be completed in one time slot. To overcome
this problem, the present invention uses a pipeline approach, where
in each time slot N distinct schedules are in progress
simultaneously, for N distinct time slots in the future. Each phase
of a particular schedule involves just one input. In any given time
slot, other inputs simultaneously perform the phases of schedules for
other distinct future time slots.
[0023] Definition: A schedule for a future time slot T_k is said to
be completed when all N phases are completed, i.e. when all inputs
have had a chance to choose (successfully or otherwise) an output for
transmission during T_k.

[0024] While N time slots are required to complete the N phases of a
given schedule, N phases of N different schedules may be completed
within one time slot using a pipeline approach, by computing the N
schedules in parallel. This is effectively equivalent to the
completion of one schedule every time slot. In RRGS, a specific
future time slot is made available to the inputs for scheduling in a
round-robin fashion. An input i that starts a schedule for the kth
time slot (in the future) chooses an output in a round-robin (RR)
fashion, and sends to the next input (i + 1) mod N the set O_k that
indicates the remaining output ports that are still free to receive
packets during the kth time slot. Any input i that receives from the
previous input (i - 1) mod N the set O_k of available outputs for the
kth time slot chooses one output, if possible, from this set, and
sends the modified set O_k to the next input (i + 1) mod N, if input
i did not complete the schedule for the kth time slot, T_k. An input
i that completes a schedule for the kth time slot should not forward
the modified set O_k to the next input (i + 1) mod N. Thus input
(i + 1) mod N, which did not receive the set O_k in the current time
slot, will start a new schedule (for a new time slot) in the next
time slot. Step 1 of RRGS implies that an input refrains from
forwarding the set O_k once in N time slots. The input i that does
not forward the set O_k should be the last one that chooses an output
for the kth time slot.

[0025] Theorem 1: If input (const - k) mod N refrains from
forwarding the set O_k in the (k - 1)th time slot, and the number of
inputs N is odd, then input (const - k) mod N completes the schedule
for the kth time slot.

[0026] Proof: The above theorem implies that:

In every time slot, all N inputs will have an opportunity
to schedule transmission for a future time slot.

In each time slot, an input can schedule a transmission
for no more than one future time slot.

In each time slot, an output can be scheduled to
receive transmission from only one input.

[0027] Fix input i = (const - k) mod N that refrains from forwarding
the set O_k in the (k - 1)th time slot. In such a case, each of the
previous N - 1 inputs must forward the set O_k when it makes a
reservation for the kth time slot. Note that in the (k - 1 - j)th
time slot, input (i + j) mod N does not forward the set O_l to the
next input. Also, input (i - j) mod N makes a reservation for the kth
time slot. Such a schedule is feasible if:

(∀ 1 ≤ j ≤ N-1) (i + j) mod N ≠ (i - j) mod N
⇐ (∀ 1 ≤ j ≤ N-1) 2j ≠ 0 mod N
⇐ N is an odd number.
[0028] The schedule for the kth time slot was started by the input
(i + 1) mod N in the (k - N)th time slot, since the input i did not
forward the set O_{k-N} in the (k - N - 1)th time slot.
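
The feasibility condition in the proof reduces to simple modular arithmetic, which can be checked numerically. The following Python sketch is only an illustration; the function name odd_pipeline_feasible is hypothetical. It tests whether 2j mod N is non-zero for every j between 1 and N - 1, which is the condition under which the input reserving a slot never coincides with the input that withholds its set.

```python
def odd_pipeline_feasible(N):
    """True if 2*j mod N != 0 for all 1 <= j <= N-1."""
    return all((2 * j) % N != 0 for j in range(1, N))


if __name__ == "__main__":
    for N in range(2, 10):
        print(N, odd_pipeline_feasible(N))
```

For an even N the check fails at j = N/2, which is why the even case requires the modified scheme with delayed forwarding.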

[0029] The timing diagram for an embodiment with a 5x5 switch is
shown in Figure 1. This figure shows the relation between inputs and
the time slots for which they are choosing their outputs. For
example, in time slot T_5, input I_1 is scheduling or choosing an
output for transmission during time slot T_10 while I_3 is scheduling
for T_9, and so on. In the next time slot T_6, I_1 is scheduling for
T_8, and so on. A bold vertical line denotes that the previous input
completed a schedule, and the next input will start a new schedule.
An input does not forward the set O to the next input if it is the
last one to choose an output for the associated time slot. Since this
condition occurs once per N = 5 time slots, an input makes a decision
not to forward the set O by means of a modulo N counter.
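
The relation between inputs and the time slots they are scheduling can be tabulated with a short Python sketch. It assumes the reading i_l = (const - N - l) mod N with const = N + 1 for the input that starts a new schedule in the l-th time slot, together with the recurrence k(i_l, l) = l + N and k(i, l) = k((i - 1) mod N, l - 1) otherwise; the function name odd_slot_table is hypothetical. With N = 5 the table reproduces the examples quoted for Figure 1.

```python
def odd_slot_table(N, num_slots):
    """Return {l: [k(0,l), ..., k(N-1,l)]} for an odd number of inputs.

    An entry of 0 means the input is suppressed in that slot
    (this only happens during the initialization period).
    """
    const = N + 1
    prev = [0] * N                      # all inputs suppressed before slot 1
    table = {}
    for l in range(1, num_slots + 1):
        starter = (const - N - l) % N   # i_l, assumed reading of the formula
        row = [0] * N
        for i in range(N):
            if i == starter:
                row[i] = l + N          # a new schedule starts here
            else:
                row[i] = prev[(i - 1) % N]   # inherited from previous input
        table[l] = row
        prev = row
    return table


if __name__ == "__main__":
    for l, row in odd_slot_table(5, 6).items():
        print(f"T{l}:", row)
```

Once the initialization period has passed, every row holds N distinct future time slots, so no two inputs work on the same schedule in the same time slot.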
[0030] Finally, the actions taken by RRGS in the l-th time slot (for
example, the current time slot) are specified. O_k denotes the set of
available outputs for the kth time slot. Let k(i, l) > 0 denote the
time slot for which input i reserves an output in the l-th time slot,
and let i_l = (const - N - l) mod N denote the input that starts a
new schedule in the l-th time slot. Also, k(i, l) = 0 implies that
the action of input i in time slot l is suppressed. The scheduler
requires proper initialization. This initialization period lasts for
N time slots. Assume that the initialization process commences in the
first time slot T_1. The initialization process is started by setting
k(0, 1) = k(1, 1) = ... = k(N - 1, 1) = 0, const = N + 1. That is,
the actions of all the inputs are suppressed for the first N time
slots, unless modified subsequently. It is also assumed that packets
are queued at the input in logically separate queues called
input-output queues, with one queue for each output port, to prevent
HOL blocking. Further, receive input-output queues and transmit
input-output queues are also provided.

• O_{k(i_l, l)} = {0, 1, ..., N-1}

• k(i_l, l) = l + N, and k(i, l) = k((i - 1) mod N, l - 1)
for 0 ≤ i ≤ N - 1 and i ≠ i_l.

Input i, 0 ≤ i ≤ N - 1, chooses one output j in a RR
fashion from the set O_{k(i, l)} for which it has a packet
to send, provided that k(i, l) ≠ 0. (Input i, 0 ≤ i ≤ N - 1
and i ≠ i_l, received O_{k(i, l)} from the input (i - 1)
mod N in the previous time slot.) Output j is
excluded from O_{k(i, l)}.

Input i, 0 ≤ i ≤ N - 1, stores the address of the chosen
output in its connection memory at location k(i, l) mod N.
The head of line packet from the corresponding receive
input-output queue is moved to the separate transmit
input-output queue.

Input i, 0 ≤ i ≤ N - 1 and i ≠ (i_l - 2) mod N, forwards
the set O_{k(i, l)} to the next input (i + 1) mod N.
Note that just N bits of information need be forwarded.

A crossbar connection is established between the
input i, 0 ≤ i ≤ N - 1, and the output whose
address is read from location (l mod N) of input i's
connection memory.

For each input i, 0 ≤ i ≤ N - 1, the HOL packet of
the scheduled transmit input-output queue is transmitted
through the switch core.
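
The per-slot actions for an odd number of inputs can be sketched as a small simulation. This is a simplified model under stated assumptions rather than the controller itself: has_packet is a hypothetical callback standing in for the receive input-output queues (packets are not consumed), reservations are collected in a dictionary keyed by the reserved time slot instead of a connection memory, and the round-robin pointer rr_ptr is assumed to advance past the last chosen output. The formula i_l = (const - N - l) mod N with const = N + 1 is the assumed reading of the starting input.

```python
def pipelined_rrgs_odd(N, has_packet, num_slots):
    """Simulate pipelined RRGS for odd N; return {k: [(input, output), ...]}."""
    assert N % 2 == 1, "this sketch covers the odd case only"
    const = N + 1
    k_prev = [0] * N                        # k(i, l-1); 0 means suppressed
    free_prev = [set() for _ in range(N)]   # output sets held after slot l-1
    rr_ptr = [0] * N
    schedules = {}
    for l in range(1, num_slots + 1):
        starter = (const - N - l) % N       # i_l starts a new schedule
        k_cur = [0] * N
        free_cur = [set() for _ in range(N)]
        for i in range(N):
            if i == starter:
                k_cur[i] = l + N
                free_cur[i] = set(range(N))     # all outputs are free
            else:
                src = (i - 1) % N               # set forwarded last slot
                k_cur[i] = k_prev[src]
                free_cur[i] = free_prev[src]
        for i in range(N):
            if k_cur[i] == 0:
                continue                        # action suppressed
            for offset in range(N):             # round-robin output choice
                j = (rr_ptr[i] + offset) % N
                if j in free_cur[i] and has_packet(i, j):
                    free_cur[i].discard(j)      # exclude j from the set
                    schedules.setdefault(k_cur[i], []).append((i, j))
                    rr_ptr[i] = (j + 1) % N
                    break
        # The set held by input (i_l - 2) mod N is never read again,
        # which models that input refraining from forwarding it.
        k_prev, free_prev = k_cur, free_cur
    return schedules


if __name__ == "__main__":
    result = pipelined_rrgs_odd(5, lambda i, j: True, 12)
    print(result[6])
```

With every queue backlogged, each completed schedule pairs all N inputs with N distinct outputs, even though each input performs only one phase per time slot.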

3. Pipelined RRGS for Even Number of Inputs 

[0031] In the scheme for an odd number of inputs, each input refrains
from forwarding the modified set O_k to its next neighbor if the
input was the last one to choose an output for a future time slot,
and thus completed the schedule for that time slot. A direct
application of such an algorithm (developed for an odd value of N) to
an even number of inputs will result in some inputs scheduling for
more than one future time slot, while other inputs do not schedule at
all. Thus, to control a switch with an even number of inputs, the
pipeline technique is modified.
[0032] The proof of Theorem 1 suggests that delaying, instead of
blocking, the control information accommodates an even number of
inputs. For the case with an even number of inputs, each input
refrains from forwarding the set O_l to the next input once in N time
slots, and in the next time slot the input forwards the delayed set
O_l from the previous time slot. When input i forwards the delayed
set O_l, it will not forward the current set O_k. Therefore, input i
should be the last one that chooses an output for the kth time slot.

[0033] Theorem 2: If input (const - k) mod N delays the set O_l in
the (k - 2)th time slot (and forwards it in the (k - 1)th time slot),
and the number of inputs N is even, then input (const - k) mod N
completes the schedule for the kth time slot.

[0034] Proof: Fix the input i = (const - k) mod N that delays the set
O_l in the (k - 2)th time slot, and forwards the delayed set O_l
instead of O_k in the (k - 1)th time slot. So, in the (k - 1 - j)th
time slot, input (i + j - 1) mod N delays its set, and input
(i + j) mod N forwards its delayed set. In the (k - 1 - j)th time
slot, input (i - j) mod N makes a reservation for the kth time slot
and forwards O_k, provided that i - j ≠ i + j mod N and
i - j ≠ i + j - 1 mod N, which is true for 1 ≤ j ≤ N/2 - 1 and N
even. Input (i - N/2) mod N stores O_k during the (k - 1 - N/2)th
time slot, so that no input reserves the kth time slot. In the
(k - 2 - N/2)th time slot, input (i - N/2) mod N makes a reservation
for the kth time slot. In the (k - 1 - j)th time slot,
N/2 + 2 ≤ j ≤ N, input (i - j + 1) mod N makes a reservation for the
kth time slot and forwards O_k, provided that i - j ≠ i + j mod N
and i - j ≠ i + j - 1 mod N, which is true for N even.

[0035] So, a schedule for the kth time slot progresses through the
pipeline without being interrupted before it is completed by user i.
This schedule was started by the user (i + 1) mod N in the
(k - N - 1)th time slot, since the input i delayed the control
information in the (k - N - 2)th time slot.
[0036] The timing diagram for an embodiment with a 4x4 switch is
shown in Figure 2 to illustrate the case with an even number of
inputs. A shaded rectangle denotes a delaying of the control
information O. A bold vertical line denotes that the previous input
completed a schedule, and the next input will start a new schedule.
[0037] Again, the actions that RRGS takes in the l-th time slot for
an even number of inputs are specified. O_k denotes the set of
available outputs for the kth time slot. Let k(i, l) > 0 denote the
time slot for which input i reserves an output in the l-th time slot,
and let i_l = (const - N - 1 - l) mod N denote the input that starts
a new schedule in the l-th time slot. Let k(i, l) = 0 imply that the
action of input i in time slot l is suppressed. The scheduler
requires proper initialization. This initialization period lasts for
N time slots. Assume that the initialization process commences in the
first time slot T_1. The initialization process is started by setting
k(0, 1) = k(1, 1) = ... = k(N - 1, 1) = 0, const = N + 2. That is,
the actions of all the inputs are suppressed for the first N time
slots, unless modified subsequently.

• O_{k(i_l, l)} = {0, 1, ..., N-1}

• k(i_l, l) = l + N + 1, k((i_l + 1) mod N, l) = k(i_l, l - 2),
and k(i, l) = k((i - 1) mod N, l - 1) for 0 ≤ i ≤ N - 1
and i ∉ {i_l, (i_l + 1) mod N}.

Input i, 0 ≤ i ≤ N - 1, chooses one output j in a RR
fashion from the set O_{k(i, l)} for which it has a packet
to send, provided that k(i, l) ≠ 0. (Input i, 0 ≤ i ≤ N - 1
and i ≠ i_l, received the set O_{k(i, l)} from the input
(i - 1) mod N in the previous time slot.) Output j is
excluded from the set O_{k(i, l)}.

Input i, 0 ≤ i ≤ N - 1, stores the address of the chosen
output in its connection memory at location k(i, l) mod
(N + 1). The HOL packet from the corresponding
receive input-output queue is moved to the separate
transmit input-output queue.

Input i, 0 ≤ i ≤ N - 1 and i ≠ (i_l - 2) mod N, forwards
the set O_{k(i, l)} to the next input (i + 1) mod N.
Input (i_l - 2) mod N delays the set O_{k((i_l - 2) mod N, l)}
for one time slot before forwarding it.

A crossbar connection is established between the
input i, 0 ≤ i ≤ N - 1, and the output whose
address is read from location (l mod (N + 1)) of the
input i's connection memory.

For each input i, 0 ≤ i ≤ N - 1, the reserved packet
at the head of the scheduled transmit input-output
queue is transmitted through the switch core.
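
The effect of the one-slot delay can be seen by tabulating k(i, l) for an even N. The Python sketch below assumes i_l = (const - N - 1 - l) mod N with const = N + 2 and the recurrences listed above; the function name even_slot_table is hypothetical, and the rows for slots 0 and -1 are filled with zeros to model the suppressed actions before initialization.

```python
def even_slot_table(N, num_slots):
    """Return {l: [k(0,l), ..., k(N-1,l)]} for an even number of inputs."""
    const = N + 2
    hist = {-1: [0] * N, 0: [0] * N}        # suppressed before slot 1
    for l in range(1, num_slots + 1):
        starter = (const - N - 1 - l) % N   # i_l starts a new schedule
        resumed = (starter + 1) % N         # receives the delayed set
        row = [0] * N
        for i in range(N):
            if i == starter:
                row[i] = l + N + 1
            elif i == resumed:
                row[i] = hist[l - 2][starter]   # set held back for one slot
            else:
                row[i] = hist[l - 1][(i - 1) % N]
        hist[l] = row
    return hist


if __name__ == "__main__":
    table = even_slot_table(4, 8)
    for l in range(1, 9):
        print(f"T{l}:", table[l])
```

For N = 4 each schedule takes N + 1 = 5 time slots to pass through the pipeline (four phases plus one delay), and every input still contributes exactly one phase to it before the scheduled time slot arrives.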

4. Multicast Scheduling 

[0038] Another aspect of this invention is the incor- 
poration of a multicast function. Multicast packets are 
stored in a separate queue which is served in first come 
first served (FCFS fashion). Each queue has a multicast 
bit map (BM) which denotes the destinations of its HOL 
packet. In the simplest version, the multicast queue will 
have priority over the unicast queues. Additional multi- 
cast actions that are taken in the Ith time slot are as fol- 
lows. 

Input i, 0 ≤ i ≤ N - 1, chooses all outputs j such
that j ∈ O_{k(i,l)} ∩ BM_i. The HOL multicast packet will be
transmitted to the chosen outputs in the kth time
slot.

If O_{k(i,l)} ∩ BM_i is an empty set, the unicast queues
are served. Otherwise, the chosen outputs are
excluded from O_{k(i,l)} and BM_i.

If BM_i is empty, the HOL multicast packet is
deleted from the multicast queue.
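
The multicast actions above amount to a set intersection followed by two set differences. The following is a minimal sketch of that step for one input, with the free-output set and the bit map both modelled as Python sets; the function name multicast_step and the returned flag are illustrative and do not appear in the specification.

```python
def multicast_step(free_outputs, bitmap):
    """One multicast action at a single input in one time slot.

    free_outputs: set of outputs still free in the slot being scheduled.
    bitmap: set of destinations still owed to the HOL multicast packet.
    Both sets are updated in place. Returns (chosen, done), where `chosen`
    is the set of outputs reserved now and `done` is True when the HOL
    multicast packet has no destinations left and can be deleted.
    An empty `chosen` means the unicast queues are served instead.
    """
    chosen = free_outputs & bitmap
    free_outputs -= chosen
    bitmap -= chosen
    return chosen, not bitmap
```

A packet whose destinations are not all free in one scheduled slot simply stays at the head of the multicast queue with a smaller bit map and is offered again for a later slot.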

5. Implementation of the Switch Controller 

[0039] A controller for an NxN optical cross bar
switch is shown in Figure 3. Each input requires an input
module (3.11, 3.21, 3.31, ..., 3.N1), a RR arbiter and
pipeline controller (3.12, 3.22, 3.32, ..., 3.N2), and a
connection memory (3.13, 3.23, 3.33, ..., 3.N3). The input
module (IM) stores incoming packets in logically separate
receive queues, with each queue destined for a particular
output. The IM sends requests to its associated RR arbiter.
The RR arbiter chooses one of the free outputs in a 
future time slot, and informs the corresponding input 
and the pipeline controller about this choice. Note that 
the initialization procedure for the pipeline ensures that 
in any given time slot, no two arbiters pick the same 
future time slot for scheduling transmission. The input 
module stores the successful packet in a separate
transmit input-output queue. The RR arbiter also writes
its scheduling decision into a specified memory location
of the associated connection memory. The location in
memory is determined by the time slot for which the
packet is scheduled. The pipeline controller informs the
RR arbiter of the next input about all the outputs that
have not been reserved for a particular time slot; more
precisely, it inhibits requests for the reserved outputs.
If some input does not forward the control information, its
pipeline controller allows the RR arbiter of the next input
to choose any output for a future time slot.
[0040] Based on the schedule written in the con- 
nection memory, packets from the input modules are 
transferred to output modules via the switch core. 
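
One way to picture the connection memory is as a small circular buffer that is written at the location given by the scheduled slot and read at the location given by the current slot. The sketch below assumes N + 1 locations, as in the even-N variant, and assumes that a location is cleared once it has been read; the class name ConnectionMemory and both method names are hypothetical.

```python
class ConnectionMemory:
    """Per-input connection memory modelled as a circular buffer.

    Assumptions: N + 1 locations (even-N variant) and clear-on-read.
    """

    def __init__(self, n):
        self.size = n + 1
        self.cells = [None] * self.size

    def reserve(self, future_slot, output):
        # Written by the RR arbiter when it schedules `future_slot`.
        self.cells[future_slot % self.size] = output

    def read(self, current_slot):
        # Read when the cross bar connection for `current_slot` is set up.
        location = current_slot % self.size
        output = self.cells[location]
        self.cells[location] = None
        return output
```

Because at most N + 1 future slots are outstanding at any time in this variant, two pending reservations never map to the same location.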

6. PERFORMANCE COMPARISON 

[0041] In this section, RRGS is compared with
other protocols of similar complexity. Complexity is
measured by the time that the protocol requires to
complete one schedule. The HOL, I-TDMA, SLIP, RGS and
RRGS protocols are compared. See K. Bogineni, K. M.
Sivilingam, and P. W. Dowd, "Low-complexity multiple
access protocols for wavelength-division multiplexed
photonic networks," IEEE Journal on Selected Areas in
Communications, vol. 11, no. 4, May 1993, pp. 590-604;
D. Guo, Y. Yemini, Z. Zhang, "Scalable high-speed
protocols for WDM optical star networks," IEEE
INFOCOM'94; M. J. Karol, M. G. Hluchyj, and S. P. Morgan,
"Input vs. output queuing on a space-division
packet switch," IEEE Transactions on Communications,
vol. COM-35, no. 12, December 1987, pp. 1347-1356;
and N. McKeown, P. Varaiya, J. Walrand, "Scheduling
cells in an input-queued switch," Electronic Letters, vol.
29, no. 25, December 1993, pp. 2174-2175.

[0042] The head of line (HOL) protocol denotes the
simplest protocol for the switch with input queuing, see
M. J. Karol, M. G. Hluchyj, and S. P. Morgan, "Input vs.
output queuing on a space-division packet switch,"
IEEE Transactions on Communications, vol. COM-35,
no. 12, December 1987, pp. 1347-1356. Each input
sends a request for transmission of the HOL packet to
the appropriate output. The requested output issues a
grant to one of the inputs in a round-robin fashion. In the
next time slot, the granted inputs send packets to the
corresponding outputs.

[0043] In interleaved TDMA (I-TDMA) outputs are
assigned to the inputs in a fixed manner, see K.
Bogineni, K. M. Sivilingam, and P. W. Dowd, "Low-complexity
multiple access protocols for wavelength-division
multiplexed photonic networks," IEEE Journal on
Selected Areas in Communications, vol. 11, no. 4, May
1993, pp. 590-604. Time is divided into frames. The
transmission schedule is predetermined in each time
slot of the frame. Packets are stored in separate queues
according to their destinations, and transmitted in their
scheduled time slots.
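
A fixed assignment of this kind can be written down in one line. The sketch below assumes a frame of N time slots and the cyclic rule "input i is connected to output (i + t) mod N in slot t"; that particular rule is an illustrative assumption, since the text only states that the assignment is fixed and predetermined.

```python
def itdma_output(input_port, slot, n):
    """Output assigned to `input_port` in `slot` of the frame (assumed rule)."""
    return (input_port + slot) % n


def itdma_frame(n):
    """Full frame: row t lists the output assigned to each input in slot t."""
    return [[itdma_output(i, t, n) for i in range(n)] for t in range(n)]
```

Every row of the frame is a permutation of the outputs, so no output contention can occur, but a slot is wasted whenever the queue it is assigned to is empty.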

[0044] Iterative round-robin matching with slip
(SLIP) has been proposed, see N. McKeown, P. Varaiya,
J. Walrand, "Scheduling cells in an input-queued
switch," Electronic Letters, vol. 29, no. 25, December
1993, pp. 2174-2175. Each input sends requests to all
outputs for which it has packets to send. A requested
output issues a grant to one of the requesting inputs, in
a round-robin fashion. Inputs that receive multiple
grants choose one of the permitted outputs in a
round-robin fashion. Each round-robin choice starts from the
position after the last chosen candidate.
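
The request, grant and accept phases just described can be sketched for a single iteration as follows. The pointer handling follows the last sentence above literally, so every choice moves the corresponding pointer to the position after the chosen candidate; whether the cited protocol updates pointers under exactly this condition is not stated here and is treated as an assumption. The names rr_pick and slip_iteration are illustrative.

```python
def rr_pick(candidates, pointer, n):
    """Return the first candidate at or after `pointer`, cyclically, or None."""
    for offset in range(n):
        c = (pointer + offset) % n
        if c in candidates:
            return c
    return None


def slip_iteration(requests, grant_ptr, accept_ptr):
    """One request/grant/accept round.

    requests[i] is the set of outputs input i has packets for.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)
    grants = {}  # input -> set of outputs that granted it
    for j in range(n):
        requesters = {i for i in range(n) if j in requests[i]}
        winner = rr_pick(requesters, grant_ptr[j], n)
        if winner is not None:
            grants.setdefault(winner, set()).add(j)
            grant_ptr[j] = (winner + 1) % n
    match = {}
    for i, offered in grants.items():
        j = rr_pick(offered, accept_ptr[i], n)
        match[i] = j
        accept_ptr[i] = (j + 1) % n
    return match
```

In the first round of the usage below, input 0 receives two grants and input 1 none, which is the kind of wasted grant that a greedy sequential scheduler avoids.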
[0045] Random greedy scheduling (RGS) is similar
to RRGS except that the round-robin choice is replaced
by random choice, see D. Guo, Y. Yemini, Z. Zhang,
"Scalable high-speed protocols for WDM optical star
networks," IEEE INFOCOM'94. The controller randomly
chooses a sequence of inputs, and randomly matches
them to unmatched outputs. At high speeds, the RGS
cannot be completed within one time slot; however, we
assess its performance to investigate the effect of
replacing the random choice by a round-robin choice.
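
The difference between the two greedy schedulers lies only in how the input order and the output are chosen, which the following non-pipelined sketch makes explicit. The queue matrix, the per-input round-robin pointers and all function names are illustrative assumptions; in particular, moving a pointer to the position after the chosen output is one plausible reading of "round-robin choice", not a detail taken from the text.

```python
import random


def greedy_schedule(queues, input_order, pick):
    """Visit inputs in order; each takes one free output it has a packet for."""
    free_outputs = set(range(len(queues)))
    schedule = []
    for i in input_order:
        candidates = sorted(j for j in free_outputs if queues[i][j] > 0)
        if candidates:
            j = pick(i, candidates)
            free_outputs.remove(j)
            schedule.append((i, j))
    return schedule


def rrgs_schedule(queues, start_input, pointers):
    """Round-robin variant: cyclic input order, round-robin output choice."""
    n = len(queues)
    order = [(start_input + t) % n for t in range(n)]

    def pick(i, candidates):
        j = min(candidates, key=lambda c: (c - pointers[i]) % n)
        pointers[i] = (j + 1) % n  # assumed pointer update rule
        return j

    return greedy_schedule(queues, order, pick)


def rgs_schedule(queues, rng):
    """Random variant: random input order, random output choice."""
    order = list(range(len(queues)))
    rng.shuffle(order)
    return greedy_schedule(queues, order, lambda i, c: rng.choice(c))
```

Both variants return a conflict-free set of (input, output) pairs; only the round-robin one is deterministic for a given starting input and pointer state.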
[0046] In FIG. 4 the average packet delay versus the
offered load for HOL, I-TDMA, SLIP, RGS and RRGS is
plotted. The analytical performance results obtained
agree well with the simulation results.

[0047] FIG. 5 shows the complementary distribution
function of packet delay in I-TDMA, SLIP, RGS and
RRGS for fixed offered traffic. The plotted curves are
obtained using simulation results. RRGS significantly
outperforms I-TDMA and SLIP for most traffic loads.
[0048] FIG. 6 shows protocol performance for
non-uniform traffic for a specific traffic matrix for four
different loads G1, G2, G3 and G4.

7. CONCLUSION 

[0049] The present invention proposes a pipelined
round robin scheduler for fast input buffered packet
switches. The RRGS protocol of the present invention
provides shorter average packet delay than other
protocols of comparable complexity. The packet delay
distribution does not exhibit a heavy tail. Under non-uniform
traffic loading, lightly loaded queues experience longer
delay, but heavily loaded queues experience delays that
are significantly lower, compared to delays in other
protocols.
[0050] Other modifications and variations to the
invention will be apparent to those skilled in the art from
the foregoing disclosure and teachings. Thus, while only
certain embodiments of the invention have been
specifically described herein, it will be apparent that
numerous modifications may be made thereto without
departing from the spirit and scope of the invention.

Claims 

1. A method for determining a time slot in a NxN
crossbar switch for a round robin greedy scheduling
protocol, comprising N logical queues at each
input, corresponding to N output ports, the input for
the protocol being a state of all the input-output
queues, output of the protocol being a schedule,
the method comprising:

a) choosing input corresponding to i = (constant - k - 1)
mod N;

b) stopping if there are no more inputs, otherwise
choosing the next input in a round robin
fashion determined by i = (i + 1) mod N;

c) choosing an output j such that a pair (i,j) belongs to a
set C = {(i,j) | there is at least one packet from i
to j}, if the pair (i,j) exists;

d) removing i from a set of inputs and going to
step b if the pair (i,j) does not exist in step c;

e) removing i from the set of inputs and j from a
set of outputs; and

f) adding the pair (i,j) to the schedule and going
to step b.

2. A method of scheduling wherein in each time slot N
distinct schedules are in progress simultaneously
for N distant future time slots, the method comprising:

a) making a specific future time slot available to
input for scheduling in a round-robin fashion;

b) selecting an output for a kth time slot in
future, by an input i;

c) starting a schedule for the kth time slot in
future;

d) determining a next input (i+1) mod N and
sending to the next input remaining outputs
that are free to receive packets during the kth
time slot.

3. The method of claim 2 wherein if an input i did not
complete a schedule for the kth time slot, it selects an
output and sends a modified output set to a next
input; and if the input i completes a schedule for
the kth time slot, it does not send the output set to
the next input.

4. The method of claim 3 wherein an input that does 
not receive a modified set of outputs from a previ- 
ous input starts a new schedule. 

5. A method of pipelined round robin greedy scheduling
for odd number of inputs wherein an lth time slot
is completed using a process comprising:

e) initializing k(0,1) = k(1,1) = ... = k(N-1,1) = 0, const
= N+1, wherein k(i, l) > 0 is the time slot for
which input i reserves an output in lth time slot,
i_l = (const - N - l) mod N denotes an input that
starts a new schedule in lth time slot, and
k(i, l) = 0 implies that the action of input i in time
slot l is suppressed;

f) setting O_{l+N} = {0, 1, ..., N - 1}, k(i_l, l) = l+N,
k(i, l) = k((i - 1) mod N, l - 1) for 0 ≤ i ≤ N - 1 and
i ≠ i_l;

g) choosing one output j at input i, 0 ≤ i ≤ N - 1,
in a round-robin fashion from the set O_{k(i,l)}
for which it has a packet to send, provided that
k(i, l) ≠ 0, and excluding j from O_{k(i,l)};

h) storing the address, at input i, 0 ≤ i ≤ N - 1,
of the chosen output in its connection memory
at location k(i, l) mod N and moving a head of
line (HOL) packet from a corresponding receive
input-output queue to separate transmit
input-output queue;

i) forwarding the set O_{k(i,l)} at input i, 0 ≤ i ≤ N - 1
and i ≠ (i_l - 2) mod N, to the next input
(i + 1) mod N;

j) establishing a cross bar connection between
the input i, 0 ≤ i ≤ N - 1, and output whose
address is read from location (l mod (N + 1)) of
the input i's connection memory; and

k) transmitting the reserved packet at the head
of the scheduled transmit input-output queue
through the switch core for each input i, 0 ≤ i ≤
N - 1.


6. A method of pipelined round robin greedy scheduling
for even number of inputs wherein an lth time
slot is completed using a process comprising:

l) initializing k(0,1) = k(1,1) = ... = k(N-1,1) = 0, const
= N+1, wherein k(i, l) > 0 is the time slot for
which input i reserves an output in lth time slot,
i_l = (const - N - l) mod N denotes an input that
starts a new schedule in lth time slot, and
k(i, l) = 0 implies that the action of input i in time
slot l is suppressed;

m) setting O_{l+N} = {0, 1, ..., N - 1}, k(i_l, l) = l+N+1,
k((i_l + 1) mod N, l) = k(i_l, l - 2), and k(i, l) =
k((i - 1) mod N, l - 1) for 0 ≤ i ≤ N - 1 and
i ∉ {i_l, (i_l + 1) mod N};

n) choosing one output j at input i, 0 ≤ i ≤ N - 1,
in a round-robin fashion from the set O_{k(i,l)}
for which it has a packet to send, provided that
k(i, l) ≠ 0, and excluding j from O_{k(i,l)};

o) storing the address, at input i, 0 ≤ i ≤ N - 1,
of the chosen output in its connection memory
at location k(i, l) mod (N + 1) and moving an
HOL packet from a corresponding receive
input-output queue to separate transmit
input-output queue;

p) forwarding the set O_{k(i,l)} at input i, 0 ≤ i ≤ N - 1
and i ≠ (i_l - 2) mod N, to the next input
(i + 1) mod N, wherein input (i_l - 2) mod N delays
the set O_{k((i_l - 2) mod N, l)} for one time slot before
forwarding it;

q) establishing a cross bar connection between
the input i, 0 ≤ i ≤ N - 1, and output whose
address is read from location (l mod (N + 1)) of
the input i's connection memory; and

r) transmitting the reserved packet at the head
of the scheduled transmit input-output queue
through the switch core for each
input i, 0 ≤ i ≤ N - 1.

7. A method according to claim 5 wherein multicast
scheduling is incorporated in round-robin greedy
scheduling algorithm wherein multicast packets are
stored in a first come first served fashion and have
priority over unicast queues and steps in lth time
slot further comprise:

s) choosing all outputs j such that j ∈ O_{k(i,l)} ∩
BM_i at input i, 0 ≤ i ≤ N - 1, and transmitting
HOL multicast packets to the chosen outputs in
the kth time slot;

t) serving unicast queues if O_{k(i,l)} ∩ BM_i is an
empty set, otherwise excluding the chosen outputs
from O_{k(i,l)} and BM_i; and

u) deleting HOL multicast packets from the
multicast queue if BM_i is empty.

8. A method according to claim 6 wherein multicast
scheduling is incorporated in round-robin greedy
scheduling algorithm wherein multicast packets are
stored in a first come first served fashion and have
priority over unicast queues and steps in lth time
slot further comprise:

v) choosing all outputs j such that j ∈ O_{k(i,l)} ∩
BM_i at input i, 0 ≤ i ≤ N - 1;

w) transmitting HOL multicast packets to the chosen
outputs in the kth time slot;

x) serving unicast queues if O_{k(i,l)} ∩ BM_i is an
empty set, otherwise excluding the chosen outputs
from O_{k(i,l)} and BM_i; and

y) deleting HOL multicast packets from the multicast
queue if BM_i is empty.

9. An N stage pipeline system for scheduling a NxN
switch where a stage i is associated with an input i,
said stage i schedules transmission to an output in
a future time slot, said future time slot rippling 
through all stages, wherein all pipeline stages cor- 
responding to inputs are performing scheduling 
concurrently such that no two inputs choose a 
same future time slot at a same time, output slots 
being selected based on a round-robin fashion, 
wherein when an output is selected by a stage, the 
output is removed from a free pool of outputs such 
that a pipeline stage does not pick the output that 
has already been selected during a time slot by 
inputs. 
FIG. 6A

FIG. 6B